Using Chinese Gigaword Corpus and Chinese Word Sketch in Linguistic Research

Authors

  • Jia-Fei Hong
  • Chu-Ren Huang
Abstract

In this paper, we explore the possibility of deeper linguistic research based on corpora and computational linguistic tools. In particular, we adopt Chinese Word Sketch, the application of the Word Sketch Engine to the Chinese Gigaword Corpus, for linguistic research. We apply Chinese Sketch Engine results to deeper linguistic accounts such as selectional restrictions and event type selection. The study is based on a comparison of two basic verbs of ingestion: chi1 ‘to eat’ and he1 ‘to drink’.

Related resources

Automatic Acquisition of Linguistic Knowledge: From Sinica Corpus to Gigaword Corpus

The raison d’être for a corpus, as first conceived by Francis and Kucera in 1963, was to provide a body of linguistic facts from which linguistic knowledge could be generalized [1]. The methods of acquisition have evolved as corpus size and technology have advanced over the past 40 years. Originally, corpus-based concordances helped linguists form generalizations. This was what Fillmo...

Chinese Sketch Engine and the Extraction of Grammatical Collocations

This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for the automatic description of lexical information, including collocation extraction, based on large-scale corpora. The original work on Sketch Engine was based on the BNC. We extend Sketch Engine to Chinese based on the Gigaword corpus from the LDC. We d...

Uniform and Effective Tagging of a Heterogeneous Giga-word Corpus

Tagging, as the most crucial annotation of language resources, can still be challenging when the corpus is large and when the corpus data are not homogeneous. The Chinese Gigaword Corpus presents both challenges. The corpus contains roughly 1.12 billion Chinese characters from two heterogeneous sources: news from Taiwan and news from Mainland China. In other words, in addition to its ...

Chinese Sketch Engine and Mapping Principles: A Corpus-Based Study of Conceptual Metaphors Using the BUILDING Source Domain

The goal of this paper is to use a large-scale corpus, i.e. the Gigaword Corpus via the interface of Chinese Sketch Engine, to determine the underlying reasons for source and target domain pairings in conceptual metaphors, called Mapping Principles. In particular, we will employ a frequency-based collocational approach to examine metaphors that use the source domain of BUILDING in Mandarin Chin...

Event Selection and Coercion of Two Verbs of Ingestion: A MARVS Perspective

Event semantics in general and event type coercion in particular have been a challenging yet rewarding topic in verbal semantics (Pustejovsky, 1995). However, there have been few corpus-based empirical accounts discussing the range of event type coercions based on the lexical meanings of the verbs. In this paper, we explore the possible types of event coercions for two verbs of ingestion in Man...


Publication date: 2006